Random Spiking and Systematic Evaluation of Defenses Against Adversarial Examples
Image classifiers often suffer from adversarial examples, which are generated
by strategically adding a small amount of noise to input images to trick
classifiers into misclassification. Over the years, many defense mechanisms
have been proposed, and different researchers have made seemingly contradictory
claims on their effectiveness. We present an analysis of possible adversarial
models, and propose an evaluation framework for comparing different defense
mechanisms. As part of the framework, we introduce a more powerful and
realistic adversary strategy. Furthermore, we propose a new defense mechanism
called Random Spiking (RS), which generalizes dropout and introduces random
noise into the training process in a controlled manner. Evaluations under our
proposed framework suggest RS delivers better protection against adversarial
examples than many existing schemes.
Comment: To appear in ACM CODASPY 2020
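As an illustration of the idea, below is a minimal sketch of a dropout-like "random spiking" layer, assuming that with probability p each activation is replaced by random noise rather than zeroed out. The class name, noise range, and placement in the network are illustrative assumptions, not the authors' exact construction.

import torch
import torch.nn as nn

class RandomSpiking(nn.Module):
    """Dropout-style layer: with probability p, replace an activation with noise."""
    def __init__(self, p=0.1, noise_scale=1.0):
        super().__init__()
        self.p = p                      # probability that a unit "spikes"
        self.noise_scale = noise_scale  # assumed amplitude of the injected noise

    def forward(self, x):
        if not self.training:
            return x                    # like dropout, only active during training
        spike = torch.rand_like(x) < self.p
        noise = (torch.rand_like(x) * 2 - 1) * self.noise_scale
        return torch.where(spike, noise, x)

# Example: place the layer after an activation in a small convolutional block.
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    RandomSpiking(p=0.1),
)
out = block(torch.randn(8, 3, 32, 32))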
Procedural Noise Adversarial Examples for Black-Box Attacks on Deep Convolutional Networks
Deep Convolutional Networks (DCNs) have been shown to be vulnerable to
adversarial examples---perturbed inputs specifically designed to produce
intentional errors in the learning algorithms at test time. Existing
input-agnostic adversarial perturbations exhibit interesting visual patterns
that are currently unexplained. In this paper, we introduce a structured
approach for generating Universal Adversarial Perturbations (UAPs) with
procedural noise functions. Our approach unveils the systemic vulnerability of
popular DCN models like Inception v3 and YOLO v3, with single noise patterns
able to fool a model on up to 90% of the dataset. Procedural noise allows us to
generate a distribution of UAPs with high universal evasion rates using only a
few parameters. Additionally, we propose using Bayesian optimization to
efficiently learn procedural noise parameters and construct inexpensive
untargeted black-box attacks. We demonstrate that these attacks achieve an
average of fewer than 10 queries per successful attack, a 100-fold improvement
over existing methods. We further
motivate the use of input-agnostic defences to increase the stability of models
to adversarial perturbations. The universality of our attacks suggests that DCN
models may be sensitive to aggregations of low-level class-agnostic features.
These findings give insight on the nature of some universal adversarial
perturbations and how they could be generated in other applications.
Comment: 16 pages, 10 figures. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (CCS '19)
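To make the attack concrete, here is a minimal sketch of a Perlin-noise perturbation applied uniformly to a batch of images, using the third-party `noise` package's pnoise2 function. The frequency, octave, and scaling choices are assumptions about one reasonable parameterization, not the paper's exact procedure, and the Bayesian optimization over these parameters is omitted.

import numpy as np
from noise import pnoise2  # third-party procedural noise library (pip install noise)

def perlin_perturbation(size=224, freq=1 / 32.0, octaves=4, eps=8 / 255.0):
    """Single input-agnostic Perlin pattern scaled to an L-infinity budget eps."""
    pattern = np.array(
        [[pnoise2(i * freq, j * freq, octaves=octaves) for j in range(size)]
         for i in range(size)],
        dtype=np.float32,
    )
    pattern = np.sign(pattern) * eps                   # clamp to +/- eps per pixel
    return np.repeat(pattern[..., None], 3, axis=-1)   # replicate across RGB channels

# Usage: add the same pattern to every image and measure how often the
# classifier's prediction changes (the universal evasion rate).
images = np.random.rand(16, 224, 224, 3).astype(np.float32)  # stand-in batch in [0, 1]
adv_images = np.clip(images + perlin_perturbation(), 0.0, 1.0)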